Import data and package
Loading required package: Rcpp
Loading 'brms' package (version 2.22.0). Useful instructions
can be found by typing help('brms'). A more detailed introduction
to the package is available through vignette('brms_overview').
Attaching package: 'brms'
The following object is masked from 'package:stats':
ar
Attaching package: 'survival'
The following object is masked from 'package:brms':
kidney
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.3.0
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.1.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Data preparation
The outcome of interest is 5-year overall mortality.
limit_FU <- 2
df_analysis <- read_rds ("../results/df_SHC_pm25.rds" ) %>%
mutate (ttd = ifelse (ttd> limit_FU,NA ,ttd),
binary_outcome = as.numeric (! is.na (ttd)),
ttfu = ifelse (ttfu> limit_FU,limit_FU,ttfu),
stage = case_when (stage %in% c ("Stage I" , "Stage II" , "Stage III" )~ "early" ,
stage %in% c ("Stage IV" )~ "advanced" )) %>%
select (binary_outcome,ttfu,
sex,age,race,ms,hist,
stage,ph,surgery,radiation,
immuno,chemo,contains ("pred" )) %>%
filter (! is.na (surgery) & ! is.na (chemo)) %>%
mutate (race = fct_na_value_to_level (race,level = "Unknown" ),
ms = fct_na_value_to_level (ms,level= "Unknown" ),
stage = factor (stage,levels= c ("early" ,'advanced' ),
labels = c ("early" ,'advanced' )),
stage = fct_na_value_to_level (stage,level= "Unknown" ))
glm (binary_outcome~ pred_pm_mean_naive+ sex+ age+
race+ ms+ hist+ stage+ ph+ surgery+ radiation+ immuno+ chemo,
family= binomial,
data= df_analysis) %>%
summary ()
Call:
glm(formula = binary_outcome ~ pred_pm_mean_naive + sex + age +
race + ms + hist + stage + ph + surgery + radiation + immuno +
chemo, family = binomial, data = df_analysis)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.122117 0.343959 -6.170 6.84e-10 ***
pred_pm_mean_naive 0.034139 0.018660 1.830 0.067324 .
sexMale 0.267760 0.074978 3.571 0.000355 ***
age 0.019794 0.003275 6.043 1.51e-09 ***
raceAsian/Pacific Islander -0.388221 0.089786 -4.324 1.53e-05 ***
raceBlack/African -0.207207 0.190314 -1.089 0.276259
raceHispanic 0.130554 0.144989 0.900 0.367885
raceAmerican Indian/Native 0.655696 1.152217 0.569 0.569307
raceOther 0.010094 0.156038 0.065 0.948421
raceUnknown -0.144515 0.198938 -0.726 0.467574
msMarried -0.080859 0.108628 -0.744 0.456659
msDivorced, separated, or widowed -0.062797 0.128206 -0.490 0.624267
msOther 0.114333 0.365962 0.312 0.754723
msUnknown 1.297038 0.187188 6.929 4.24e-12 ***
histAD -0.627130 0.106663 -5.880 4.11e-09 ***
histLC -0.253873 0.336083 -0.755 0.450017
histNSCLC_NOS -0.276294 0.158399 -1.744 0.081108 .
histOTH -0.210859 0.146510 -1.439 0.150092
histSC 0.400513 0.179167 2.235 0.025390 *
stageadvanced 1.220211 0.089229 13.675 < 2e-16 ***
stageUnknown 0.235970 0.175561 1.344 0.178920
ph 0.343926 0.133113 2.584 0.009774 **
surgery -1.435236 0.113853 -12.606 < 2e-16 ***
radiation -0.072812 0.078934 -0.922 0.356294
immuno -0.364101 0.121174 -3.005 0.002658 **
chemo 0.071546 0.082407 0.868 0.385285
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 5822.3 on 4390 degrees of freedom
Residual deviance: 4707.0 on 4365 degrees of freedom
AIC: 4759
Number of Fisher Scoring iterations: 4
df_analysis %>%
filter (binary_outcome== 0 ) %>%
select (ttfu) %>%
unlist () -> df_censoring
summary (df_censoring)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0137 2.0000 2.0000 1.8409 2.0000 2.0000
Bayesian modeling
We use the brms package to fit a logistic regression to the data. The measurement error can be easily incorporated in this framework using the me() syntax.
num_core <- 8
num_chain <- 8
num_iter <- 5000
brms_control <- list (adapt_delta= 0.9 ,stepsize= 0.1 ,max_treedepth= 20 )
set.seed (2025 )
brms_formula_naive_noME <- as.formula (binary_outcome ~
pred_pm_mean_naive +
sex+ age+ race+ ms+ hist+ stage+ ph+
surgery+ radiation+ immuno+ chemo)
fit_naive_noME <- brm (brms_formula_naive_noME,
data= df_analysis,
family= bernoulli (),
cores= num_core,
chain= num_chain,
iter= num_iter,
control = brms_control)
Compiling Stan program...
brms_formula_adjusted_noME <- as.formula (binary_outcome ~
pred_pm_mean_adjusted +
sex+ age+ race+ ms+ hist+ stage+ ph+
surgery+ radiation+ immuno+ chemo)
fit_adjusted_noME <- brm (brms_formula_adjusted_noME,
data= df_analysis,
family= bernoulli (),
cores= num_core,
chain= num_chain,
iter= num_iter,
control= brms_control)
Compiling Stan program...
Start sampling
brms_formula_naive <- as.formula (binary_outcome ~
me (pred_pm_mean_naive,pred_pm_sd_naive) +
sex+ age+ race+ ms+ hist+ stage+ ph+
surgery+ radiation+ immuno+ chemo)
fit_naive <- brm (brms_formula_naive,
data= df_analysis,
family= bernoulli (),
cores= num_core,
chain= num_chain,
iter= num_iter,
control= brms_control)
Compiling Stan program...
Start sampling
brms_formula_adjusted <- as.formula (binary_outcome ~
me (pred_pm_mean_adjusted,pred_pm_sd_adjusted) +
sex+ age+ race+ ms+ hist+ stage+ ph+
surgery+ radiation+ immuno+ chemo)
fit_adjusted <- brm (brms_formula_adjusted,
data= df_analysis,
family= bernoulli (),
chain= num_core,
cores= num_chain,
iter= num_iter,
control= brms_control)
Compiling Stan program...
Start sampling
Summary and diagnostic plots
Family: bernoulli
Links: mu = logit
Formula: binary_outcome ~ pred_pm_mean_naive + sex + age + race + ms + hist + stage + ph + surgery + radiation + immuno + chemo
Data: df_analysis (Number of observations: 4391)
Draws: 8 chains, each with iter = 5000; warmup = 2500; thin = 1;
total post-warmup draws = 20000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
Intercept -2.14 0.34 -2.80 -1.48 1.00 25726
pred_pm_mean_naive 0.03 0.02 -0.00 0.07 1.00 27584
sexMale 0.27 0.08 0.12 0.42 1.00 25653
age 0.02 0.00 0.01 0.03 1.00 26624
raceAsianDPacificIslander -0.39 0.09 -0.56 -0.22 1.00 25197
raceBlackDAfrican -0.21 0.19 -0.59 0.16 1.00 25582
raceHispanic 0.13 0.15 -0.16 0.42 1.00 28288
raceAmericanIndianDNative 0.76 1.21 -1.54 3.24 1.00 27637
raceOther 0.01 0.16 -0.30 0.32 1.00 27038
raceUnknown -0.15 0.20 -0.54 0.25 1.00 28195
msMarried -0.08 0.11 -0.30 0.14 1.00 19331
msDivorcedseparatedorwidowed -0.06 0.13 -0.31 0.19 1.00 19399
msOther 0.10 0.37 -0.63 0.83 1.00 25826
msUnknown 1.31 0.19 0.95 1.68 1.00 21727
histAD -0.63 0.11 -0.84 -0.42 1.00 17908
histLC -0.26 0.34 -0.94 0.40 1.00 25031
histNSCLC_NOS -0.28 0.16 -0.59 0.03 1.00 20826
histOTH -0.21 0.15 -0.50 0.08 1.00 19429
histSC 0.41 0.18 0.05 0.76 1.00 21261
stageadvanced 1.23 0.09 1.05 1.41 1.00 23301
stageUnknown 0.23 0.18 -0.12 0.58 1.00 24428
ph 0.35 0.13 0.08 0.61 1.00 30145
surgery -1.44 0.11 -1.67 -1.22 1.00 21690
radiation -0.07 0.08 -0.23 0.08 1.00 22893
immuno -0.37 0.12 -0.61 -0.13 1.00 26610
chemo 0.07 0.08 -0.09 0.23 1.00 26945
Tail_ESS
Intercept 16337
pred_pm_mean_naive 15639
sexMale 15790
age 16989
raceAsianDPacificIslander 16380
raceBlackDAfrican 14350
raceHispanic 15727
raceAmericanIndianDNative 14594
raceOther 15701
raceUnknown 16215
msMarried 15977
msDivorcedseparatedorwidowed 15540
msOther 16351
msUnknown 17102
histAD 16115
histLC 16039
histNSCLC_NOS 15784
histOTH 16204
histSC 16855
stageadvanced 16373
stageUnknown 14386
ph 15934
surgery 16470
radiation 16185
immuno 14647
chemo 16326
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
plot (fit_naive_noME,type= "trace" )
plot (fit_naive_noME,variable= "b_pred_pm_mean_naive" )
summary (fit_adjusted_noME)
Family: bernoulli
Links: mu = logit
Formula: binary_outcome ~ pred_pm_mean_adjusted + sex + age + race + ms + hist + stage + ph + surgery + radiation + immuno + chemo
Data: df_analysis (Number of observations: 4391)
Draws: 8 chains, each with iter = 5000; warmup = 2500; thin = 1;
total post-warmup draws = 20000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
Intercept -2.13 0.35 -2.81 -1.45 1.00 26554
pred_pm_mean_adjusted 0.03 0.02 -0.00 0.07 1.00 28146
sexMale 0.27 0.08 0.12 0.42 1.00 29133
age 0.02 0.00 0.01 0.03 1.00 27556
raceAsianDPacificIslander -0.39 0.09 -0.57 -0.21 1.00 26843
raceBlackDAfrican -0.21 0.19 -0.59 0.16 1.00 27519
raceHispanic 0.13 0.14 -0.15 0.41 1.00 29045
raceAmericanIndianDNative 0.73 1.23 -1.60 3.22 1.00 28782
raceOther 0.01 0.16 -0.30 0.32 1.00 27616
raceUnknown -0.15 0.20 -0.55 0.25 1.00 27827
msMarried -0.08 0.11 -0.29 0.13 1.00 18339
msDivorcedseparatedorwidowed -0.06 0.13 -0.31 0.19 1.00 19913
msOther 0.10 0.38 -0.64 0.82 1.00 26047
msUnknown 1.31 0.19 0.94 1.69 1.00 22904
histAD -0.63 0.11 -0.84 -0.42 1.00 18715
histLC -0.26 0.34 -0.93 0.39 1.00 26160
histNSCLC_NOS -0.28 0.16 -0.59 0.03 1.00 21679
histOTH -0.21 0.15 -0.50 0.08 1.00 20811
histSC 0.41 0.18 0.06 0.76 1.00 21806
stageadvanced 1.23 0.09 1.06 1.40 1.00 23602
stageUnknown 0.23 0.18 -0.11 0.58 1.00 27109
ph 0.35 0.13 0.08 0.60 1.00 29206
surgery -1.44 0.11 -1.67 -1.22 1.00 25146
radiation -0.07 0.08 -0.23 0.08 1.00 26827
immuno -0.37 0.12 -0.61 -0.13 1.00 30330
chemo 0.07 0.08 -0.09 0.24 1.00 25828
Tail_ESS
Intercept 16004
pred_pm_mean_adjusted 16244
sexMale 15256
age 17025
raceAsianDPacificIslander 15701
raceBlackDAfrican 15674
raceHispanic 15954
raceAmericanIndianDNative 13782
raceOther 15085
raceUnknown 14651
msMarried 15991
msDivorcedseparatedorwidowed 15463
msOther 14925
msUnknown 16638
histAD 15953
histLC 15372
histNSCLC_NOS 16048
histOTH 16148
histSC 16634
stageadvanced 16113
stageUnknown 16483
ph 15448
surgery 16755
radiation 16480
immuno 15358
chemo 15129
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
plot (fit_adjusted_noME,type= "trace" )
plot (fit_adjusted_noME,variable= "b_pred_pm_mean_adjusted" )
Family: bernoulli
Links: mu = logit
Formula: binary_outcome ~ me(pred_pm_mean_naive, pred_pm_sd_naive) + sex + age + race + ms + hist + stage + ph + surgery + radiation + immuno + chemo
Data: df_analysis (Number of observations: 4391)
Draws: 8 chains, each with iter = 5000; warmup = 2500; thin = 1;
total post-warmup draws = 20000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat
Intercept -2.20 0.36 -2.89 -1.51 1.00
sexMale 0.27 0.07 0.12 0.42 1.00
age 0.02 0.00 0.01 0.03 1.00
raceAsianDPacificIslander -0.39 0.09 -0.57 -0.22 1.00
raceBlackDAfrican -0.21 0.19 -0.59 0.16 1.00
raceHispanic 0.13 0.15 -0.16 0.41 1.00
raceAmericanIndianDNative 0.74 1.23 -1.59 3.25 1.00
raceOther 0.01 0.16 -0.30 0.31 1.00
raceUnknown -0.15 0.20 -0.54 0.25 1.00
msMarried -0.08 0.11 -0.29 0.14 1.00
msDivorcedseparatedorwidowed -0.06 0.13 -0.32 0.19 1.00
msOther 0.10 0.37 -0.63 0.82 1.00
msUnknown 1.31 0.19 0.94 1.68 1.00
histAD -0.63 0.11 -0.84 -0.42 1.00
histLC -0.26 0.34 -0.92 0.40 1.00
histNSCLC_NOS -0.28 0.16 -0.59 0.03 1.00
histOTH -0.21 0.15 -0.50 0.08 1.00
histSC 0.41 0.18 0.06 0.76 1.00
stageadvanced 1.23 0.09 1.05 1.40 1.00
stageUnknown 0.24 0.17 -0.11 0.58 1.00
ph 0.35 0.14 0.08 0.61 1.00
surgery -1.44 0.11 -1.67 -1.23 1.00
radiation -0.07 0.08 -0.23 0.08 1.00
immuno -0.37 0.12 -0.61 -0.13 1.00
chemo 0.07 0.08 -0.09 0.23 1.00
mepred_pm_mean_naivepred_pm_sd_naive 0.04 0.02 -0.00 0.08 1.00
Bulk_ESS Tail_ESS
Intercept 17214 15448
sexMale 27457 15011
age 25075 15695
raceAsianDPacificIslander 23978 15303
raceBlackDAfrican 27066 15259
raceHispanic 26054 14535
raceAmericanIndianDNative 29865 13499
raceOther 24671 15047
raceUnknown 25672 14441
msMarried 13792 13636
msDivorcedseparatedorwidowed 14249 14593
msOther 24053 15139
msUnknown 16763 15056
histAD 13718 14789
histLC 24491 15284
histNSCLC_NOS 16780 15325
histOTH 16900 15723
histSC 19063 15692
stageadvanced 19970 15932
stageUnknown 24739 15571
ph 27897 14846
surgery 20338 15754
radiation 23052 15349
immuno 27080 15697
chemo 24825 15849
mepred_pm_mean_naivepred_pm_sd_naive 12066 13234
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
plot (fit_naive,type= "trace" )
plot (fit_naive,variable= "bsp_mepred_pm_mean_naivepred_pm_sd_naive" )
Family: bernoulli
Links: mu = logit
Formula: binary_outcome ~ me(pred_pm_mean_adjusted, pred_pm_sd_adjusted) + sex + age + race + ms + hist + stage + ph + surgery + radiation + immuno + chemo
Data: df_analysis (Number of observations: 4391)
Draws: 8 chains, each with iter = 5000; warmup = 2500; thin = 1;
total post-warmup draws = 20000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI
Intercept -2.20 0.36 -2.91 -1.49
sexMale 0.27 0.07 0.12 0.42
age 0.02 0.00 0.01 0.03
raceAsianDPacificIslander -0.39 0.09 -0.57 -0.22
raceBlackDAfrican -0.21 0.19 -0.59 0.16
raceHispanic 0.13 0.14 -0.16 0.41
raceAmericanIndianDNative 0.73 1.22 -1.60 3.18
raceOther 0.01 0.16 -0.30 0.31
raceUnknown -0.14 0.20 -0.53 0.25
msMarried -0.08 0.11 -0.30 0.14
msDivorcedseparatedorwidowed -0.06 0.13 -0.32 0.19
msOther 0.10 0.37 -0.63 0.82
msUnknown 1.31 0.19 0.95 1.68
histAD -0.63 0.11 -0.84 -0.42
histLC -0.26 0.34 -0.93 0.40
histNSCLC_NOS -0.28 0.16 -0.59 0.03
histOTH -0.21 0.15 -0.50 0.08
histSC 0.41 0.18 0.06 0.76
stageadvanced 1.23 0.09 1.05 1.41
stageUnknown 0.24 0.18 -0.11 0.58
ph 0.34 0.13 0.08 0.60
surgery -1.45 0.12 -1.68 -1.22
radiation -0.07 0.08 -0.23 0.08
immuno -0.37 0.12 -0.60 -0.12
chemo 0.07 0.08 -0.09 0.23
mepred_pm_mean_adjustedpred_pm_sd_adjusted 0.04 0.02 -0.00 0.09
Rhat Bulk_ESS Tail_ESS
Intercept 1.00 18446 14935
sexMale 1.00 26074 15462
age 1.00 27285 15033
raceAsianDPacificIslander 1.00 22173 14553
raceBlackDAfrican 1.00 27671 14505
raceHispanic 1.00 25143 15200
raceAmericanIndianDNative 1.00 27072 14422
raceOther 1.00 26831 14671
raceUnknown 1.00 24191 14004
msMarried 1.00 13982 14259
msDivorcedseparatedorwidowed 1.00 14973 14944
msOther 1.00 25590 15014
msUnknown 1.00 18221 16039
histAD 1.00 13348 14985
histLC 1.00 24654 14927
histNSCLC_NOS 1.00 16485 16552
histOTH 1.00 15765 14781
histSC 1.00 17726 15925
stageadvanced 1.00 19233 15044
stageUnknown 1.00 24612 15166
ph 1.00 27164 14866
surgery 1.00 20116 14138
radiation 1.00 22187 15928
immuno 1.00 28029 15368
chemo 1.00 25011 15905
mepred_pm_mean_adjustedpred_pm_sd_adjusted 1.00 13483 14022
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
plot (fit_adjusted,type= "trace" )
plot (fit_adjusted,variable= "bsp_mepred_pm_mean_adjustedpred_pm_sd_adjusted" )
Session info
R version 4.5.1 (2025-06-13)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Los_Angeles
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lubridate_1.9.4 forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4
[5] purrr_1.1.0 readr_2.1.5 tidyr_1.3.1 tibble_3.3.0
[9] ggplot2_3.5.2 tidyverse_2.0.0 survival_3.8-3 brms_2.22.0
[13] Rcpp_1.1.0
loaded via a namespace (and not attached):
[1] gtable_0.3.6 tensorA_0.36.2.1 xfun_0.52
[4] QuickJSR_1.8.0 htmlwidgets_1.6.4 processx_3.8.6
[7] inline_0.3.21 lattice_0.22-7 callr_3.7.6
[10] tzdb_0.5.0 ps_1.9.1 vctrs_0.6.5
[13] tools_4.5.1 generics_0.1.4 stats4_4.5.1
[16] parallel_4.5.1 pkgconfig_2.0.3 Matrix_1.7-3
[19] checkmate_2.3.2 RColorBrewer_1.1-3 distributional_0.5.0
[22] RcppParallel_5.1.10 lifecycle_1.0.4 compiler_4.5.1
[25] farver_2.1.2 Brobdingnag_1.2-9 codetools_0.2-20
[28] htmltools_0.5.8.1 bayesplot_1.13.0 yaml_2.3.10
[31] crayon_1.5.3 pillar_1.11.0 StanHeaders_2.32.10
[34] bridgesampling_1.1-2 abind_1.4-8 nlme_3.1-168
[37] posterior_1.6.1 rstan_2.32.7 tidyselect_1.2.1
[40] digest_0.6.37 mvtnorm_1.3-3 stringi_1.8.7
[43] reshape2_1.4.4 labeling_0.4.3 splines_4.5.1
[46] fastmap_1.2.0 grid_4.5.1 cli_3.6.5
[49] magrittr_2.0.3 loo_2.8.0 dichromat_2.0-0.1
[52] pkgbuild_1.4.8 withr_3.0.2 scales_1.4.0
[55] backports_1.5.0 bit64_4.6.0-1 timechange_0.3.0
[58] rmarkdown_2.29 matrixStats_1.5.0 bit_4.6.0
[61] gridExtra_2.3 hms_1.1.3 coda_0.19-4.1
[64] evaluate_1.0.4 knitr_1.50 rstantools_2.4.0
[67] rlang_1.1.6 glue_1.8.0 vroom_1.6.5
[70] rstudioapi_0.17.1 jsonlite_2.0.0 plyr_1.8.9
[73] R6_2.6.1